Applying active learning to high-throughput phenotyping algorithms for electronic health records data.

Authors

Yukun Chen

Robert J Carroll

Eugenia R McPeek Hinz

Anushi Shah

Anne E Eyler

Joshua C Denny

Hua Xu

Abstract

OBJECTIVES Generalizable, high-throughput phenotyping methods based on supervised machine learning (ML) algorithms could significantly accelerate the use of electronic health records data for clinical and translational research. However, they often require large numbers of annotated samples, which are costly and time-consuming to review. We investigated the use of active learning (AL) in ML-based phenotyping algorithms.

METHODS We integrated an uncertainty sampling AL approach with support vector machines-based phenotyping algorithms and evaluated its performance using three annotated disease cohorts: rheumatoid arthritis (RA), colorectal cancer (CRC), and venous thromboembolism (VTE). We investigated performance using two types of feature sets: unrefined features, which contained all clinical concepts extracted from notes and billing codes; and a smaller set of refined features selected by domain experts. The performance of AL was compared with a passive learning (PL) approach based on random sampling.

RESULTS Our evaluation showed that AL outperformed PL on all three phenotyping tasks. When unrefined features were used in the RA and CRC tasks, AL reduced the number of annotated samples required to achieve an area under the curve (AUC) score of 0.95 by 68% and 23%, respectively. AL also achieved a reduction of 68% for VTE with an optimal AUC of 0.70 using refined features. As expected, refined features improved the performance of phenotyping classifiers and required fewer annotated samples.

CONCLUSIONS This study demonstrated that AL can be useful in ML-based phenotyping methods. Moreover, AL and feature engineering based on domain knowledge could be combined to develop efficient and generalizable phenotyping methods.
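The comparison described in the abstract (uncertainty sampling versus random sampling around an SVM classifier, scored by AUC) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the SVM library, kernel, seed-set size, or batch size, so everything below is an assumption. The sketch swaps in a Pegasos-style linear SVM written in plain Python, uses synthetic two-dimensional data in place of clinical features, and queries one sample per round. All function names (`train_linear_svm`, `run_learning`, `make_toy_data`, and so on) are hypothetical.

```python
import random

def decision_value(w, x):
    """Signed distance proxy: w[:-1] . x + bias (bias stored in w[-1])."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]

def train_linear_svm(X, y, epochs=30, lam=0.01, seed=0):
    """Pegasos-style subgradient descent for a linear SVM; labels are +1/-1."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)  # last entry is the bias weight
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * decision_value(w, X[i]) < 1.0
            w = [wj * (1.0 - eta * lam) for wj in w]  # regularization shrink
            if violated:  # hinge-loss subgradient step
                xi = list(X[i]) + [1.0]
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == -1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def run_learning(X, y, budget, strategy, seed=0):
    """Grow a labeled set up to `budget` samples, one query per round.

    strategy="uncertainty": query the sample closest to the decision boundary.
    strategy="random": query a random sample (the passive-learning baseline).
    """
    rng = random.Random(seed)
    pool = list(range(len(X)))
    rng.shuffle(pool)
    # Assumed seed set: two samples of each class, identical for both strategies.
    labeled = ([i for i in pool if y[i] == 1][:2]
               + [i for i in pool if y[i] == -1][:2])
    unlabeled = [i for i in pool if i not in labeled]
    while len(labeled) < budget and unlabeled:
        w = train_linear_svm([X[i] for i in labeled], [y[i] for i in labeled])
        if strategy == "uncertainty":
            pick = min(unlabeled, key=lambda i: abs(decision_value(w, X[i])))
        else:
            pick = rng.choice(unlabeled)
        unlabeled.remove(pick)
        labeled.append(pick)  # stands in for asking an expert to annotate
    return labeled

def make_toy_data(n, seed=0):
    """Synthetic stand-in for patient feature vectors (not clinical data)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.3 else -1
        X.append([rng.gauss(label, 1.0), rng.gauss(label, 1.0)])
        y.append(label)
    return X, y

X, y = make_toy_data(300)
X_pool, y_pool, X_test, y_test = X[:200], y[:200], X[200:], y[200:]
for strategy in ("uncertainty", "random"):
    chosen = run_learning(X_pool, y_pool, budget=30, strategy=strategy)
    w = train_linear_svm([X_pool[i] for i in chosen], [y_pool[i] for i in chosen])
    test_auc = auc([decision_value(w, x) for x in X_test], y_test)
    print(strategy, len(chosen), round(test_auc, 3))
```

Repeating the final loop over a range of `budget` values gives one learning curve per strategy. The sample reductions quoted in the abstract correspond to comparing where each curve first reaches a target AUC. On toy data this easy the two curves may be hard to tell apart, so the sketch shows the procedure rather than reproducing the reported gains.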

Similar resources

Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources

OBJECTIVE Analysis of narrative (text) data from electronic health records (EHRs) can improve population-scale phenotyping for clinical and genetic research. Currently, selection of text features for phenotyping algorithms is slow and laborious, requiring extensive and iterative involvement by domain experts. This paper introduces a method to develop phenotyping algorithms in an unbiased manner...

Electronic health records-driven phenotyping: challenges, recent advances, and perspectives.

With the completion of the Human Genome Project as well as recent advances in genomic science and comparative biological studies, a new era of individualized medicine is evolving where novel biomedical discoveries are leading to more effective prevention, treatment, and diagnosis of disease. Although altered phenotypes are one of the most reliable manifestations of altered gene functions, resea...

Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network

The widespread usage of electronic health records (EHRs) for clinical research has produced multiple electronic phenotyping approaches. Methods for electronic phenotyping range from those needing extensive specialized medical expert supervision to those based on semi-supervised learning techniques. We present Automated PHenotype Routine for Observational Definition, Identification, Training and...

Using Association Rule Mining for Phenotype Extraction from Electronic Health Records

The increasing adoption of electronic health records (EHRs) due to Meaningful Use is providing unprecedented opportunities to enable secondary use of EHR data. Significant emphasis is being given to the development of algorithms and methods for phenotype extraction from EHRs to facilitate population-based studies for clinical and translational research. While preliminary work has shown demonstr...

Computational Methods for Electronic Health Record-driven Phenotyping

Each year the National Institutes of Health spends over 12 billion dollars on patient-related medical research. Accurately classifying patients into categories representing disease, exposures, or other medical conditions important to a study is critical when conducting patient-related research. Without rigorous characterization of patients, also referred to as phenotyping, relationships between e...


Journal:

Journal of the American Medical Informatics Association: JAMIA

Volume 20, issue e2

Pages: -

Publication date: 2013
